
A genetic programming-based approach to identify potential inhibitors of serine protease of Mycobacterium tuberculosis

    Madhulata Kumari

    *Author for correspondence:

    E-mail Address: mchandra724@gmail.com

    Department of Information Technology, Kumaun University, S. S. J. Campus, Almora, Uttarakhand 263601, India

    Neeraj Tiwari

    Department of Statistics, Kumaun University, S. S. J. Campus, Almora, Uttarakhand 263601, India

    Naidu Subbarao

    School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India

    Published Online: https://doi.org/10.4155/fmc-2018-0560

    Aim: We applied genetic programming approaches to understand how molecular descriptors affect the inhibitory activity of serine protease inhibitors of Mycobacterium tuberculosis (Mtb) and to discover new inhibitors as drug candidates. Materials & methods: The descriptors of an experimental dataset of Mtb serine protease inhibitors were optimized using a genetic algorithm (GA) together with correlation-based feature selection (CFS) in order to develop predictive models using machine-learning algorithms. The best model was deployed on a library of 918 phytochemical compounds to screen for potential serine protease inhibitors of Mtb. Quality and performance of the predictive models were evaluated using various standard statistical parameters. Result: The best random forest model with CFS-GA identified 126 candidate anti-tubercular agents among the 918 phytochemical compounds. In addition, the genetic programming symbolic classification method optimized the descriptors and produced an equation for the mathematical model. Conclusion: The use of CFS-GA with random forest enhanced classification accuracy and predicted new serine protease inhibitors of Mtb, which can be used for better drug development against tuberculosis.
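
The CFS-GA step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`pearson`, `cfs_merit`, `ga_select`), the GA settings, and the toy data are all hypothetical, and Pearson correlation stands in for whichever correlation measure the authors' software applies. The sketch scores a descriptor subset of size k with the standard CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean descriptor-class correlation and r_ff the mean descriptor-descriptor correlation, and uses a small genetic algorithm over bit masks to search for a high-merit subset.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cfs_merit(subset, columns, labels):
    """CFS merit of a list of column indices: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute descriptor-class correlation.
    r_cf = sum(abs(pearson(columns[i], labels)) for i in subset) / k
    # Mean absolute descriptor-descriptor correlation over all pairs.
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def ga_select(columns, labels, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search descriptor bit masks with a simple GA; needs at least two columns."""
    rng = random.Random(seed)
    n = len(columns)

    def fitness(mask):
        return cfs_merit([i for i, bit in enumerate(mask) if bit], columns, labels)

    def pick(pop):
        # Binary tournament selection.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(nxt) < pop_size:
            p1, p2 = pick(pop), pick(pop)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [(not bit) if rng.random() < p_mut else bit for bit in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return [i for i, bit in enumerate(best) if bit], fitness(best)


# Toy data (invented for illustration): 8 compounds, 3 descriptors, binary activity.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # descriptor 0: tracks activity exactly
    [1, 0, 1, 0, 1, 0, 1, 0],  # descriptor 1: uncorrelated with activity
    [0, 0, 1, 0, 1, 1, 0, 1],  # descriptor 2: partly informative
]

selected, merit = ga_select(columns, labels)
print("selected descriptor indices:", selected, "merit:", round(merit, 3))
```

In a workflow like the one the abstract outlines, the selected descriptor columns would then be passed to a classifier such as a random forest and assessed by cross-validation. The merit function rewards descriptors that correlate with activity and penalizes redundancy among them, which is why adding an uninformative descriptor lowers the score.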
